AllLife Credit Card Customer Segmentation

Customer Relationship Management (CRM) plays a crucial role in the strategy of marketing by providing organizations with the business intelligence for building, managing, and developing valuable long-term customer relationships. Many businesses have come to realize the significance of CRM and the application of technical expertise to achieve competitive advantage.

This problem centers on AllLife Bank's credit card customer base, and the bank would like to accomplish two goals. First, AllLife Bank seeks to improve its market share of credit card customers. More specifically, the marketing team proposes to run personalized campaigns to target new customers, as well as to generate more revenue from existing customers. Second, AllLife Bank would like to upgrade its service delivery model to ensure more timely problem resolution given market feedback that customers negatively perceive its credit card support services.

In order to advise AllLife Bank, we will use some of the most useful techniques in business analytics for the analysis of consumer behavior and categorization: customer segmentation. We will use clustering techniques, grouping together in homogeneous clusters those customers with similar means, end, and behavior. Customer segmentation should allow us to advise AllLife Bank marketing strategies to identify or reveal distinct groups of customers who think and function differently and follow varied approaches in their spending and purchasing habits. Clustering techniques should reveal internally homogeneous and externally heterogeneous groups.

Since customers vary in terms of behavior, needs, wants, and characteristics, our main goal in using clustering techniques will be to identify different customer types and segment the customer base into clusters of similar profiles. These segmented profiles will allow us to develop target marketing that can be executed more efficiently for AllLife Bank's two goals.

Data Set Attribute Information:

Set forth below is the data of various customers of AllLife Bank for our analysis, including their credit limit, the total number of credit cards the customer has, and the different channels through which customer has contacted AllLife Bank for any queries, including the means by which they engaged AllLife Bank for support services (visiting the bank, online, and through a call).

  • Customer Key: identifier for the customer
  • Average Credit Limit: average credit limit across all the credit cards
  • Total Credit Cards: total number of credit cards
  • Total Visits Bank: total number of bank visits
  • Total Visits Online: total number of online visits
  • Total Calls Made: total number of calls made by the customer

The Data: Exploratory Analysis, Preprocessing, Feature Engineering, and Preparation for Models

Prior to using the data to train the machine learning models, we must first analyze and preprocess the data. I will be on the lookout for certain issues in my data analysis, including but not limited to, the following:

  • Is there missing data? if so, then we will need to decide what to do with this missing data.
  • Is there duplicative data? If so, then we should delete as they would not inform our models.
  • Are the features in a uniform scale? If not then they should be normalised to the same range to avoid the training process to be dominated by one or few features with large magnitude. This is done for each of the input features. (It is important to note, in normalization procedures, once the models have been trained using normalised data.)
  • While not requested explicitly in our problem set, we should consider consider feature engineering, whether in terms of creating new variables or dropping others. This could lead us to use the new, more predictive feature in place of prior, independent features. This technique increases the input features and could result in better trained models; provided, however, there is a potential downside that it results in the requirement of more computational effort and a higher degree of potential overfitting.
  • Is there any feature(s) in the data that need to be dropped that could improve the machine learning models.

As specifically required by the assignment, we will conduct the following analysis: (i) typical univariate analysis, including but not limited to analysis of the customer variables' distributions/tails, missing values, outliers, duplicates, and (ii) perform exploratory data analysis, creating visulations to explore data (10 marks). A lot of this analysis will be accomplished by an initial review of a panda profile report. We will also properly comment on codes, provide explanations of the steps taken, and provide insights from our data analysis.

In [1]:
import warnings
warnings.filterwarnings('ignore')
In [2]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline

import itertools
import numpy as np

import os
from sklearn.cluster import KMeans
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import MinMaxScaler

from sklearn import metrics
from sklearn.metrics import silhouette_score

from sklearn.cluster import AgglomerativeClustering
from scipy.cluster.hierarchy import dendrogram, linkage

from sklearn import cluster 
from sklearn.cluster import SpectralClustering
In [3]:
read_file = pd.read_excel ('Credit Card Customer Data.xlsx')
read_file.to_csv ('Credit Card Customer Data.csv', index = None, header=True)
In [4]:
df = pd.read_csv('Credit Card Customer Data.csv')
In [5]:
df_copy=df.copy()
df.head()
Out[5]:
Sl_No Customer Key Avg_Credit_Limit Total_Credit_Cards Total_visits_bank Total_visits_online Total_calls_made
0 1 87073 100000 2 1 1 0
1 2 38414 50000 3 0 10 9
2 3 17341 50000 7 1 3 4
3 4 40496 30000 5 1 1 4
4 5 47437 100000 6 0 12 3
In [6]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 660 entries, 0 to 659
Data columns (total 7 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Sl_No                660 non-null    int64
 1   Customer Key         660 non-null    int64
 2   Avg_Credit_Limit     660 non-null    int64
 3   Total_Credit_Cards   660 non-null    int64
 4   Total_visits_bank    660 non-null    int64
 5   Total_visits_online  660 non-null    int64
 6   Total_calls_made     660 non-null    int64
dtypes: int64(7)
memory usage: 36.2 KB
In [7]:
df.isnull().sum()
Out[7]:
Sl_No                  0
Customer Key           0
Avg_Credit_Limit       0
Total_Credit_Cards     0
Total_visits_bank      0
Total_visits_online    0
Total_calls_made       0
dtype: int64
In [8]:
#Dropping any duplicates
df = df.drop_duplicates()

df.shape
Out[8]:
(660, 7)
In [9]:
df.describe().transpose()
Out[9]:
count mean std min 25% 50% 75% max
Sl_No 660.0 330.500000 190.669872 1.0 165.75 330.5 495.25 660.0
Customer Key 660.0 55141.443939 25627.772200 11265.0 33825.25 53874.5 77202.50 99843.0
Avg_Credit_Limit 660.0 34574.242424 37625.487804 3000.0 10000.00 18000.0 48000.00 200000.0
Total_Credit_Cards 660.0 4.706061 2.167835 1.0 3.00 5.0 6.00 10.0
Total_visits_bank 660.0 2.403030 1.631813 0.0 1.00 2.0 4.00 5.0
Total_visits_online 660.0 2.606061 2.935724 0.0 1.00 2.0 4.00 15.0
Total_calls_made 660.0 3.583333 2.865317 0.0 1.00 3.0 5.00 10.0

Initial Insights

As can be seen from an initial review of the data, all the sample data is filled and there is no need to determine how to handle missing data. The sample data seems to also have appropriate data types for our analysis. There were no duplicate lines of data. I now want to explore the data through univariate and bivariate analysis.

As always for me, I find a review of the data in a panda profile report helpful.

In [10]:
from pandas_profiling import ProfileReport
df.profile_report()



Out[10]:

Initial Observations: We notice that there are 5 Customer Key numbers that have two entries, so we want to look at this data more closely. There are also a significant number of zeroes for some of the data, but this is not surprising in terms of some customers displaying certain behaviors while not displaying other behaviors.

In [11]:
df[df["Customer Key"] == 47437]
Out[11]:
Sl_No Customer Key Avg_Credit_Limit Total_Credit_Cards Total_visits_bank Total_visits_online Total_calls_made
4 5 47437 100000 6 0 12 3
332 333 47437 17000 7 3 1 0
In [12]:
df[df["Customer Key"] == 37252]
Out[12]:
Sl_No Customer Key Avg_Credit_Limit Total_Credit_Cards Total_visits_bank Total_visits_online Total_calls_made
48 49 37252 6000 4 0 2 8
432 433 37252 59000 6 2 1 2
In [13]:
df[df["Customer Key"] == 97935]
Out[13]:
Sl_No Customer Key Avg_Credit_Limit Total_Credit_Cards Total_visits_bank Total_visits_online Total_calls_made
104 105 97935 17000 2 1 2 10
632 633 97935 187000 7 1 7 0
In [14]:
df[df["Customer Key"] == 96929]
Out[14]:
Sl_No Customer Key Avg_Credit_Limit Total_Credit_Cards Total_visits_bank Total_visits_online Total_calls_made
391 392 96929 13000 4 5 0 0
398 399 96929 67000 6 2 2 2
In [15]:
df[df["Customer Key"] == 50706]
Out[15]:
Sl_No Customer Key Avg_Credit_Limit Total_Credit_Cards Total_visits_bank Total_visits_online Total_calls_made
411 412 50706 44000 4 5 0 2
541 542 50706 60000 7 5 2 2

Additional Observations

  • Customer Key is mostly unique (over 99%), except for 5 entries (Customer Key numbers 47437, 37252, 97935, 96929, and 50706). We reviewed these repeating Customer Key entries to determine whether they are duplicates. While not duplicates, it seems that these are the same customers with a change in circumstances over time (mostly, later Sl_No and higher credit limits indicates a longer-term customer with changing financial conditions over time); provided, Sl_No 5 is the exception in that the customer seems to have had their credit limit reduced in the later Sl_No.
  • We will drop the (suspected) earlier customer entries for purposes of our analysis and machine learning techniques.
  • Variable Sl_No is completely unique and can be dropped for purposes of our analysis and machine learning techniques.
  • Once we drop the 5 entries for prior customer profiles (as discussed above), the Customer Key will contain completely unique values, and thus we can drop this variable for purposes of our analysis and machine learning techniques.
  • Given the limited number of credit cards, we may want to create new categories based on the number of credit cards, as this could be a significant differentiating attribute of bank customers. This would constitute our "feature engineering."
  • Given the wide ranges of the values in certain attributes, and the interquartile ranges, we suspect there may be outliers to be addressed before building our machine learning models.
  • Given the different measurements of the data, together with their distributions, we will need to normalize the data when we preprocess it.

In order for ease of reference, after addressing the issues outlined above for the Sl_No and Customer Key variables, we will create a pairplot and heatmap to explore and analyze the data.

In [16]:
# Getting rid of the earlier customer records
df = df[df["Sl_No"] != 5]
df = df[df["Sl_No"] != 49]
df = df[df["Sl_No"] != 105]
df = df[df["Sl_No"] != 392]
df = df[df["Sl_No"] != 412]
In [17]:
#Convert the number of credit cards held by customer into dummy variables 
#(This is subject to business knowledge, and number of credit cards is usually important in banking.)

one_hot = pd.get_dummies(df['Total_Credit_Cards'])
one_hot = one_hot.add_prefix('CC_')

# merge in main data frame
df = df.join(one_hot)
df.head()
Out[17]:
Sl_No Customer Key Avg_Credit_Limit Total_Credit_Cards Total_visits_bank Total_visits_online Total_calls_made CC_1 CC_2 CC_3 CC_4 CC_5 CC_6 CC_7 CC_8 CC_9 CC_10
0 1 87073 100000 2 1 1 0 0 1 0 0 0 0 0 0 0 0
1 2 38414 50000 3 0 10 9 0 0 1 0 0 0 0 0 0 0
2 3 17341 50000 7 1 3 4 0 0 0 0 0 0 1 0 0 0
3 4 40496 30000 5 1 1 4 0 0 0 0 1 0 0 0 0 0
5 6 58634 20000 3 0 1 8 0 0 1 0 0 0 0 0 0 0
In [18]:
#Dropping columns Sl_No and Cutomer Key. Also Dropping Total_Credit_Cards since it is covered by other variables
df = df.drop(columns='Sl_No')
df = df.drop(columns='Total_Credit_Cards')
df = df.drop(columns='Customer Key')
df.head()
Out[18]:
Avg_Credit_Limit Total_visits_bank Total_visits_online Total_calls_made CC_1 CC_2 CC_3 CC_4 CC_5 CC_6 CC_7 CC_8 CC_9 CC_10
0 100000 1 1 0 0 1 0 0 0 0 0 0 0 0
1 50000 0 10 9 0 0 1 0 0 0 0 0 0 0
2 50000 1 3 4 0 0 0 0 0 0 1 0 0 0
3 30000 1 1 4 0 0 0 0 1 0 0 0 0 0
5 20000 0 1 8 0 0 1 0 0 0 0 0 0 0

Let's take another look at our treated variables and their correlations.

In [19]:
sns.pairplot(df)
Out[19]:
<seaborn.axisgrid.PairGrid at 0x19ae229cfd0>
In [20]:
plt.figure(figsize=(10,8))

sns.heatmap(df.corr(),
            annot=True,
            linewidths=.5,
            center=0,
            cbar=False,
            cmap="YlGnBu")

plt.show()

Observations of Correlations: With the assumption that correlation values > 0.3 to be significant among the variables, one can observe the following:

  • Ave_Credit_Limit is moderately correlated with Total_Visits_Online, but moderately negatively correlated to Total_Calls Made.
  • By splitting the Total_Credit_Cards into individual variables based on number of credit cards, we see a clear difference in the correlations to modes of interacting with the bank (visits, online, and calls).
  • Total_Visits_Bank are moderately negatively correlated to Total_Visits_Online and Total_Calls_Made.
  • While we created dummy variables for the number of credit cards held, it is clear that groups with 1-3 credit cards show similar correlations, 4-7 credit cards show similar correlations, and 8-10 credit cards show similar correlations. Based on these results, we are going to create just three groups of card holders (1-3, 4-7, 8-10) to simplify my model and reduce likelihood of overfitting.

Now we will review the data for outliers.

In [21]:
df["1-3CCards"] = df["CC_1"] + df["CC_2"] + df["CC_3"]
df["4-7CCards"] = df["CC_4"] + df["CC_5"] + df["CC_6"] + df["CC_7"]
df["8-10CCards"] = df["CC_8"] + df["CC_9"] + df["CC_10"]
In [22]:
df.head()
Out[22]:
Avg_Credit_Limit Total_visits_bank Total_visits_online Total_calls_made CC_1 CC_2 CC_3 CC_4 CC_5 CC_6 CC_7 CC_8 CC_9 CC_10 1-3CCards 4-7CCards 8-10CCards
0 100000 1 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0
1 50000 0 10 9 0 0 1 0 0 0 0 0 0 0 1 0 0
2 50000 1 3 4 0 0 0 0 0 0 1 0 0 0 0 1 0
3 30000 1 1 4 0 0 0 0 1 0 0 0 0 0 0 1 0
5 20000 0 1 8 0 0 1 0 0 0 0 0 0 0 1 0 0
In [23]:
df = df.drop(columns='CC_1')
df = df.drop(columns='CC_2')
df = df.drop(columns='CC_3')
df = df.drop(columns='CC_4')
df = df.drop(columns='CC_5')
df = df.drop(columns='CC_6')
df = df.drop(columns='CC_7')
df = df.drop(columns='CC_8')
df = df.drop(columns='CC_9')
df = df.drop(columns='CC_10')
In [24]:
df.head()
Out[24]:
Avg_Credit_Limit Total_visits_bank Total_visits_online Total_calls_made 1-3CCards 4-7CCards 8-10CCards
0 100000 1 1 0 1 0 0
1 50000 0 10 9 1 0 0
2 50000 1 3 4 0 1 0
3 30000 1 1 4 0 1 0
5 20000 0 1 8 1 0 0
In [25]:
plt.figure(figsize=(12,6))
sns.boxplot(data=df, orient="h", palette="Set2", dodge=False)
Out[25]:
<matplotlib.axes._subplots.AxesSubplot at 0x19ae67928b0>

Action Note: Treat Outliers. We will treat the Avg_Credit_Limit for outliers.

In [26]:
# Let us take logaritmic transform for Avg_Credit_Limit to remove outliers
df['Avg_Credit_Limit'] = np.log(df['Avg_Credit_Limit'])
In [27]:
#Confirming treatment of outliers
sns.boxplot(df['Avg_Credit_Limit'])
Out[27]:
<matplotlib.axes._subplots.AxesSubplot at 0x19ae6380b80>

As stated above, we will normalize our data set to prepare the data for our machine learning tools.

In [28]:
from scipy.stats import zscore
df_std =df.apply(zscore)
In [29]:
df_hc = df_std
df_std.head()
Out[29]:
Avg_Credit_Limit Total_visits_bank Total_visits_online Total_calls_made 1-3CCards 4-7CCards 8-10CCards
0 1.631391 -0.864813 -0.548851 -1.252966 1.656157 -1.425625 -0.258409
1 0.885986 -1.480522 2.535493 1.900849 1.656157 -1.425625 -0.258409
2 0.885986 -0.864813 0.136559 0.148730 -0.603807 0.701447 -0.258409
3 0.336649 -0.864813 -0.548851 0.148730 -0.603807 0.701447 -0.258409
5 -0.099385 -1.480522 -0.548851 1.550425 1.656157 -1.425625 -0.258409

K-Means Clustering

K-Means is one of the most widely used clustering algorithms, and is both simple and efficient. The aim of K-Means algorithm is to divide M points in N dimensions into K clusters (assume k centroids) fixed a priori. These centroids should be placed in a wise fashion so that the results are optimal which otherwise can differ if locations of the centroids change. So, they should be placed as far as possible from each other. Each data point is then taken and associated with the nearest centroid until no data points are pending. This way an early grouping is done and at this point, k new centroids have to be recalculated as these will be the centers of the clusters formed earlier. After having calculated these centroids, the data points are then allocated to the clusters to the nearest centroids. In this iteration, the centroids change their position stepwise until no further modifications have to be done and the location of the centroids remain intact.

The K-Means algorithm is relatively simple. The K cluster points, which will be the centroids, are placed in the space among the data points. Each data point is assigned to the centroid for which the distance is the least. After each data object has been assigned, centroids of the new groups are re-calculated. The above two steps are repeated until the movement of the centroid ceases. This means that the objective function of having the least squared error is completed and it cannot be improved further. Hence, we get K clusters as a result.

K-Means algorithm aims at minimizing an objective function, which is measured by the squared-error. It is an indicator of the distance of the data points from their respective cluster centers. The process in this algorithm always terminates but the relevance or the optimal configuration cannot be guaranteed even when the condition on the objective function is met. The algorithm is also sensitive to the selection of the initial random cluster centers.

Metrics. Sum of Squares within Cluster (SSWC) is simple and the most widely used criterion to gauge the validity ofthe clusters. Smaller values of SSWC mean better clusters. We will review this measurement with the use of the elbow plot. Obtaining a Silhouette score is another measurement of validity. We will also seek to visualize the data in an effort to understand the clusters of segmented customers.

In [30]:
Sum_of_squared_distances = []
K = range(1,7)
for k in K:
    km = KMeans(n_clusters=k)
    km = km.fit(df_std)
    Sum_of_squared_distances.append(km.inertia_)
In [31]:
plt.plot(K, Sum_of_squared_distances, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')
plt.show()

In the plot above the elbow is at k=3 indicating the optimal k for this dataset is 3.

Now let us review the Silhouette scores.

In [32]:
silhouette_scores = [] 

for n_cluster in range(2, 7):
    silhouette_scores.append( 
        silhouette_score(df_std, KMeans(n_clusters = n_cluster).fit_predict(df_std))) 
    
# Plotting a bar graph to compare the results 
k = [2, 3, 4, 5, 6] 
plt.bar(k, silhouette_scores) 
plt.xlabel('Number of clusters', fontsize = 10) 
plt.ylabel('Silhouette Score', fontsize = 10) 
plt.show()

The Silhouette score confirms that the optimal number of clusters is 3.

In [33]:
#Setting the value of k=3 for our K-Means Clustering
kmeans = KMeans(n_clusters=3, n_init = 7, random_state=2345)
In [34]:
kmeans.fit(df_std)
Out[34]:
KMeans(n_clusters=3, n_init=7, random_state=2345)
In [35]:
centroids = kmeans.cluster_centers_
In [36]:
centroids
Out[36]:
array([[ 0.08709099,  0.48408562, -0.43382833, -0.35878099, -0.60380736,
         0.7014466 , -0.25840906],
       [-0.72136561, -0.92110633,  0.33434849,  1.11990448,  1.65615734,
        -1.42562527, -0.25840906],
       [ 1.96343118, -1.09741417,  2.89342976, -0.88696788, -0.60380736,
        -1.23655221,  3.50287842]])
In [37]:
#Calculating the centroids for the columns to profile
centroid_df = pd.DataFrame(centroids, columns = list(df_std) )
In [38]:
print(centroid_df)
   Avg_Credit_Limit  Total_visits_bank  Total_visits_online  Total_calls_made  \
0          0.087091           0.484086            -0.433828         -0.358781   
1         -0.721366          -0.921106             0.334348          1.119904   
2          1.963431          -1.097414             2.893430         -0.886968   

   1-3CCards  4-7CCards  8-10CCards  
0  -0.603807   0.701447   -0.258409  
1   1.656157  -1.425625   -0.258409  
2  -0.603807  -1.236552    3.502878  
In [39]:
## Creating new dataframe only for labels and converting it into categorical variable
df_labels = pd.DataFrame(kmeans.labels_ , columns = list(['labels']))

df_labels['labels'] = df_labels['labels'].astype('category')
In [40]:
# Joining the label dataframe with the data frame.
df_labeled = df.join(df_labels)
In [41]:
df_analysis = (df_labeled.groupby(['labels'] , axis=0)).head(4177)  
df_analysis
Out[41]:
Avg_Credit_Limit Total_visits_bank Total_visits_online Total_calls_made 1-3CCards 4-7CCards 8-10CCards labels
0 11.512925 1 1 0 1 0 0 1
1 10.819778 0 10 9 1 0 0 1
2 10.819778 1 3 4 0 1 0 0
3 10.308953 1 1 4 0 1 0 0
5 9.903488 0 1 8 1 0 0 0
... ... ... ... ... ... ... ... ...
655 11.502875 1 10 0 0 0 1 NaN
656 11.338572 1 13 2 0 0 1 NaN
657 11.884489 1 9 1 0 0 1 NaN
658 12.055250 1 15 0 0 0 1 NaN
659 12.025749 0 12 2 0 0 1 NaN

655 rows × 8 columns

In [42]:
df_labeled['labels'].value_counts()  
Out[42]:
0    433
1    172
2     45
Name: labels, dtype: int64
In [43]:
## 3D plots of clusters
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure(figsize=(8, 6))
ax = Axes3D(fig, rect=[0, 0, .95, 1], elev=20, azim=60)
k3_model=KMeans(3)
k3_model.fit(df_std)
labels = k3_model.labels_
ax.scatter(df_std.iloc[:, 0], df_std.iloc[:, 1], df_std.iloc[:, 2],c=labels.astype(np.float), edgecolor='k')
ax.w_xaxis.set_ticklabels([])
ax.w_yaxis.set_ticklabels([])
ax.w_zaxis.set_ticklabels([])
ax.set_xlabel('Length')
ax.set_ylabel('Height')
ax.set_zlabel('Weight')
ax.set_title('3D plot of KMeans Clustering')
Out[43]:
Text(0.5, 0.92, '3D plot of KMeans Clustering')

Observation: Viewed in the 3D plot, the customer segments are clear with only few data mixed in with the distinct groups. These seem like mostly consistent clusters.

In [44]:
final_model=KMeans(3)
final_model.fit(df_std)
prediction=final_model.predict(df_std)

#Append the prediction 
df_std["KCluster"] = prediction
print("KCluster Assigned : \n")
df_std[["Avg_Credit_Limit", "KCluster"]]
KCluster Assigned : 

Out[44]:
Avg_Credit_Limit KCluster
0 1.631391 2
1 0.885986 2
2 0.885986 1
3 0.336649 1
5 -0.099385 2
... ... ...
655 1.620583 0
656 1.443893 0
657 2.030968 0
658 2.214602 0
659 2.182877 0

655 rows × 2 columns

In [45]:
df_std.boxplot(by = 'KCluster',  layout=(8,2), figsize=(15, 20))
Out[45]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE8D9CDF0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE8DC3D60>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE8DE3D30>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE8E11BE0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE8E40A90>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE8E71940>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE8E7E8E0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE8EAE790>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE8EFFA90>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE8F2EEE0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE8F67370>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE8F947C0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE8FC2C10>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE8FFD0A0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE90284F0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE9052940>]],
      dtype=object)

Observations of K-Means Clustering

  • Cluster 0 is largely customers (433, the largest customer segment) with 4-7 credit cards and the widest dispersion of average credit limits. This group is hard to discern because the plots demonstrate that, while they tend to engage the bank centers more than the other groups, they also use AllLife Bank's online platform and customer call support services. Without additional data, it would be difficult to make specific recommendations. But this is important to figure out because this is the largest customer segment. Perhaps I should have further divided the variable for 4-7 credit cards to gain more insights.
  • Cluster 1 is largely customers (172) with 1-3 credit cards and the lowest average credit limit. I suspect these customers generate the lowest amount of net revenue, and tend to use online services and engage in telephonic customer service calls with the bank.
  • Cluster 2 is largely customer (45) with 8-10 credit cards and the highest average credit limit. This group largely manages its bank relationship online.

Hierarchical Clustering

Hierarchical clustering is a method of cluster analysis which builds a hierarchy of data points as they move into a cluster or out of it. Strategies for this algorithm generally fall into two categories: agglomerative and divisive. Agglomerative is a bottom-up approach by which each observation begins as an initial cluster and then merges into clusters as they move up the hierarchy. The divisive technique is a top-down approach where there is only one cluster initially and is then split into finer cluster groups as they move down the hierarchy. This merging and splitting of clusters takes place in a greedy manner and the hierarchical algorithm yields a dendrogram which represents the nested grouping of patterns and the levels at which groupings change.

We are going to execute hierarchical clustering (with different linkages, including Ward, average, and complete) with the assistance of Silhouette scores to review all of the different combinations of clusters (2:7) and linkage methodologies. In terms of the different linkages we will review, "average linkage" means the distance between all members of one cluster is calculated to all other members in a different cluster. The average of these distances is then utilized to decide which clusters will merge. "Complete linkage means the distances between the most dissimilar members for each pair of clusters are calculated and then clusters are merged based on the shortest distance. "Ward linkage" uses the analysis of variance method to determine the distance between clusters. There are other linkage methods but we are not seeking to be exhaustive in our hierarchical clustering analysis.

We will then select the highest scores to analyze using dendograms and the cophenetic coefficient (or more precisely, the cophenetic correlation coefficient), which is a measure of how faithfully a dendrogram preserves the pairwise distances between the original unmodeled data points. We note that we could use dendograms and calculate the cophenetic coefficient for each and every n cluster and different linkage methods, but that is not time efficient for this exercise.

We will finally analyze the customer segment clusters formed by hierarchical clustering using boxplots.

In [46]:
from scipy.spatial.distance import pdist  #Pairwise distribution between data points
from scipy.cluster.hierarchy import cophenet, dendrogram, linkage
In [47]:
siliuette_list_hierarchical = []

for cluster in range(2,7):
    for linkage_method in ['ward', 'average', 'complete']:
        agglomerative = AgglomerativeClustering(linkage=linkage_method, affinity='euclidean',n_clusters=cluster).fit_predict(df_hc)
        sil_score = metrics.silhouette_score(df_hc, agglomerative, metric='euclidean')
        siliuette_list_hierarchical.append((cluster, sil_score, linkage_method, len(set(agglomerative))))
        
df_hierarchical = pd.DataFrame(siliuette_list_hierarchical, columns=['cluster', 'sil_score', 'linkage_method', 'number_of_clusters'])
In [48]:
df_hierarchical.sort_values('sil_score', ascending=False)
Out[48]:
cluster sil_score linkage_method number_of_clusters
5 3 0.587177 complete 3
7 4 0.586185 average 4
8 4 0.586185 complete 4
3 3 0.586168 ward 3
1 2 0.548550 average 2
2 2 0.548550 complete 2
10 5 0.546969 average 5
6 4 0.520138 ward 4
13 6 0.519525 average 6
0 2 0.511825 ward 2
11 5 0.487878 complete 5
12 6 0.439454 ward 6
4 3 0.436560 average 3
9 5 0.435398 ward 5
14 6 0.397386 complete 6

Based on these results, we will choose some linkage methods and cluster numbers for further analysis using dendograms and calculating the cophenet index scores.

In [49]:
#Review "Average" Methodology with different clusters
Z = linkage(df_hc, 'average', metric='euclidean')
Z.shape
Out[49]:
(654, 4)
In [50]:
# cophenet index - the closer it is to 1, the better is the clustering

Z = linkage(df_hc, metric='euclidean', method='average')
c, coph_dists = cophenet(Z, pdist(df_hc))

c
Out[50]:
0.9428877620198521

This is close to 1.

In [51]:
plt.figure(figsize=(25, 10))
dendrogram(Z)
plt.show()
In [52]:
plt.figure(figsize=(25, 10))
dendrogram(
    Z,
    truncate_mode='lastp',  # show only the last p merged clusters
    p=3,  # show only the last p merged clusters
)
plt.show()

This is good but the only thing that bothers me in my review of the dendogram is that there one cluster with very few customers (8). Let's see what results when truncate with a p value of 4.

In [53]:
plt.figure(figsize=(25, 10))
dendrogram(
    Z,
    truncate_mode='lastp',  # show only the last p merged clusters
    p=4,  # show only the last p merged clusters
)
plt.show()

Unclear whether this additional customer segment cluster is helpful, but I like the more balanced clusters. Checking with a value of 5 clusters.

In [54]:
plt.figure(figsize=(25, 10))
dendrogram(
    Z,
    truncate_mode='lastp',  # show only the last p merged clusters
    p=5,  # show only the last p merged clusters
)
plt.show()

A p value of 5 did not help as it created a customer segment of 1.

In [55]:
#Review "Ward" Methodology with different clusters
Z = linkage(df_hc, 'ward', metric='euclidean')
Z.shape
Out[55]:
(654, 4)
In [56]:
Z = linkage(df_hc, metric='euclidean', method='ward')
c, coph_dists = cophenet(Z, pdist(df_hc))

c
Out[56]:
0.8454248418635996

Not as good of score as "average" linkage, but not bad either.

In [57]:
plt.figure(figsize=(25, 10))
dendrogram(Z)
plt.show()
In [58]:
plt.figure(figsize=(25, 10))
dendrogram(
    Z,
    truncate_mode='lastp',
    p=3, 
)
plt.show()
In [59]:
plt.figure(figsize=(25, 10))
dendrogram(
    Z,
    truncate_mode='lastp',
    p=4, 
)
plt.show()
In [60]:
plt.figure(figsize=(25, 10))
dendrogram(
    Z,
    truncate_mode='lastp',
    p=5, 
)
plt.show()

Observations: I am new to this but these dendrograms are yielding surprising results. While the cophenet scare is lower for the "Ward" linkage hierarchical cluster method, the spread and balance of the 4 and 5 clusters is remarkable. I prefer the five clusters.

In [61]:
#Review "Ward" Methodology with different clusters
Z = linkage(df_hc, 'complete', metric='euclidean')
Z.shape
Out[61]:
(654, 4)
In [62]:
Z = linkage(df_hc, metric='euclidean', method='complete')
c, coph_dists = cophenet(Z, pdist(df_hc))

c
Out[62]:
0.9287006873296297

A high cophenet score as well.

In [63]:
plt.figure(figsize=(25, 10))
dendrogram(Z)
plt.show()
In [64]:
plt.figure(figsize=(25, 10))
dendrogram(
    Z,
    truncate_mode='lastp',
    p=3, 
)
plt.show()
In [65]:
plt.figure(figsize=(25, 10))
dendrogram(
    Z,
    truncate_mode='lastp',
    p=4, 
)
plt.show()
In [66]:
plt.figure(figsize=(25, 10))
dendrogram(
    Z,
    truncate_mode='lastp',
    p=5, 
)
plt.show()

Observations: These results are similar to our analysis using the "average" linkage method.

Based upon our analysis, 3 clusters and the average linkage method gives us the best overall scores using Silhouette and the cophenetic correlation coefficient. However, it is not clear to me how important the difference in cophenetic scores between average and Ward linkages are, and the customer segment distributions created by the Ward linkage method are superior when reviewing the dendograms. I am going to move forward using the Ward linkage method with 5 clusters.

In [67]:
Z = linkage(df_hc, metric='euclidean', method='ward')
In [68]:
from scipy.cluster.hierarchy import cut_tree
HC_cluster_labels = cut_tree(Z, n_clusters=5).reshape(-1, )
HC_cluster_labels
Out[68]:
array([0, 0, 1, 1, 0, 2, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 2,
       0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 2, 0, 0, 0, 2, 0, 2, 0, 2, 0, 2,
       2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 0, 0, 2,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 3, 0, 2,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 2, 2, 0, 2,
       3, 2, 0, 0, 2, 0, 0, 2, 0, 0, 0, 0, 0, 2, 0, 2, 0, 0, 0, 2, 0, 2,
       0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 2, 2, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 2, 0, 2, 2, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0,
       0, 0, 0, 0, 0, 0, 2, 0, 0, 2, 0, 0, 0, 2, 2, 0, 2, 0, 0, 0, 0, 2,
       0, 2, 2, 0, 0, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
       3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 4, 4, 4, 4, 4, 2, 2, 4,
       4, 4, 4, 2, 4, 4, 4, 4, 4, 4, 4, 2, 2, 4, 4, 4, 4, 4, 2, 4, 4, 4,
       4, 4, 4, 2, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4])
In [69]:
df_hc["Hierarchical_Cluster_labels"] = HC_cluster_labels
In [70]:
df_hc.head()
Out[70]:
Avg_Credit_Limit Total_visits_bank Total_visits_online Total_calls_made 1-3CCards 4-7CCards 8-10CCards KCluster Hierarchical_Cluster_labels
0 1.631391 -0.864813 -0.548851 -1.252966 1.656157 -1.425625 -0.258409 2 0
1 0.885986 -1.480522 2.535493 1.900849 1.656157 -1.425625 -0.258409 2 0
2 0.885986 -0.864813 0.136559 0.148730 -0.603807 0.701447 -0.258409 1 1
3 0.336649 -0.864813 -0.548851 0.148730 -0.603807 0.701447 -0.258409 1 1
5 -0.099385 -1.480522 -0.548851 1.550425 1.656157 -1.425625 -0.258409 2 0
In [71]:
df_hc['Hierarchical_Cluster_labels'].value_counts()  
Out[71]:
1    216
0    175
3    169
2     54
4     41
Name: Hierarchical_Cluster_labels, dtype: int64
In [72]:
df_hc.boxplot(by='Hierarchical_Cluster_labels', layout = (5,2),figsize=(20,15))
Out[72]:
array([[<matplotlib.axes._subplots.AxesSubplot object at 0x0000019AEA3373A0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000019AEA297100>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000019AEB840100>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000019AEB86BF70>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000019AEB7ECE20>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000019AEB7C8CD0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE9EBEC70>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE9E90AF0>],
       [<matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE98911F0>,
        <matplotlib.axes._subplots.AxesSubplot object at 0x0000019AE98B7970>]],
      dtype=object)

Analysis of the Different Clusters

  • Cluster 0 is largely customers (175) with 1-3 credit cards and the lowest average credit limit. I suspect these customers generate the least amount of revenue. Note, however, the possibility that this group could have the highest need of credit and therefore be paying the most in interest to AllLife. Fewer credit cards could also indicate newer entrants to financial services, meaning opportunities to educate or those with intermittent financial distress.
  • Cluster 4 is largely customers (41) with 8-10 credit cards and the significantly higher average credit limit. I suspect these customers generate the highest amount of revenue. This is conjecture of course, but customers with a high number of credit cards is assumed to be using them. High Net Worth families usually have no more than 3-4 credit cards and use them as cash equivalents.
  • Clusters 1-3 are largely customers with 4-7 credit cards constitutes the majority of the customer base, provided that Cluster 1 (216) has a significantly higher average credit limit than clusters 2 (54) and 3 (169). I suspect the customers in Cluster 2 generate the steadiest, most consistent amount of net revenue for AllLife given their bank interaction costs. Cluster 3 is not as uniform as the other clusters because of the outliers present. Cluster 4 is prominent in that these customers have a slightly lower distribution of Average Credit Limit.
  • Given the importance of the customers with 4-7 credit cards, and the noise (the outliers) demonstrated by Cluster 2, I am interested if additional customer segmentation is possible if we split this variable into two variables (4-5 credit cards and 6-7 credit cards). I do not know but would love to do more analysis (but for now it is just an observation).

Discussion of the Different K-Means and Hierarchical Clustering Methods

General Discussion. Due to increasing commercialization, consumer data is increasing exponentially. When dealing with this large magnitude of data, organizations need to make use of more efficient clustering algorithms for customer segmentation. These clustering models need to possess the capability to process this enormous data effectively.

In preparation of this project, I researched the use of K-Means and hierarchical clustering for customer segmentation. I learned that in each of the above discussed clustering algorithms come with their own set of merits and distractions. The computational speed of K-Means clustering algorithm is relatively better as compared to the hierarchical clustering algorithms as the latter requires the calculation of the full proximity matrix after each iteration. K-Means clustering gives better performance for a large number of observations while hierarchical clustering has the ability to handle fewer data points. This is ostensibly due to the difficulty of visualizing a dendogram with large numbers of data points.

The major hindrance of K-Means clusterings is in the form of selecting the numbers of 'K' clusters for the K-Means process, which must be provided as an input to this non-hierarchical clustering algorithm. This limitation does not exist in the case of hierarchical clustering since it does not require any cluster centers as input. Hierarchical clustering also gives better results as compared to K-Means when a random dataset is used. The output or results obtained when using hierarchical clustering are in the form of dendrograms but the output of K-Means consists of flat-structured clusters which may be difficult to analyze. As the value of 'K' increases, the quality(accuracy) of hierarchical clustering improves when compared to K-Means clustering. As such, partitioning algorithms like K-Means are suitable for large datasets while hierarchical clustering algorithms are more suitable for small datasets. Given my lack of experience, I do not know whether this data set is considered a large dataset, but I suspect it is not.

Another takeaway from my research is that both K-Means and Hierarchical clustering have drawbacks that make them unsuitable when used individually. For business use, including in developing marketing and other business strategies, data visualization forms a major part of efficient data analysis and hierarchical clustering aids in doing so. However, when the performance aspect is taken into account, K-Means tends to deliver better results. With the advantages and disadvantages of the two techniques highlighted, it leaves me wondering whether one can combine these two clustering methodologies (multiple slow learners to form a smarter learner) could outperform the individual machine learning models.

Now let us take a look at several of the variables in the dataset in comparison of the customer segments created by each of the K-Means Clustering analysis and the Hierarchical Cluster analysis for a more specific comparison of the two different customer segments (clusters) created.

In [73]:
# scatter plot using the first two principal components to observe the cluster distribution
import matplotlib.pyplot as plt

plt.figure(figsize=(12,6),dpi=200)

plt.subplot(1,2,1)
sns.scatterplot(x='Avg_Credit_Limit' , y='Total_visits_online',data=df_std,hue='KCluster')

plt.subplot(1,2,2)
sns.scatterplot(x='Avg_Credit_Limit', y='Total_visits_online',data=df_hc,hue='Hierarchical_Cluster_labels')
Out[73]:
<matplotlib.axes._subplots.AxesSubplot at 0x19aea7591f0>
In [74]:
# scatter plot using the first two principal components to observe the cluster distribution
import matplotlib.pyplot as plt

plt.figure(figsize=(12,6),dpi=200)

plt.subplot(1,2,1)
sns.scatterplot(x='Avg_Credit_Limit' , y='Total_visits_bank',data=df_std,hue='KCluster')

plt.subplot(1,2,2)
sns.scatterplot(x='Avg_Credit_Limit', y='Total_visits_bank',data=df_hc,hue='Hierarchical_Cluster_labels')
Out[74]:
<matplotlib.axes._subplots.AxesSubplot at 0x19aeb6d27f0>
In [75]:
# scatter plot using the first two principal components to observe the cluster distribution
import matplotlib.pyplot as plt

plt.figure(figsize=(12,6),dpi=200)

plt.subplot(1,2,1)
sns.scatterplot(x='Avg_Credit_Limit' , y='Total_calls_made',data=df_std,hue='KCluster')

plt.subplot(1,2,2)
sns.scatterplot(x='Avg_Credit_Limit', y='Total_calls_made',data=df_hc,hue='Hierarchical_Cluster_labels')
Out[75]:
<matplotlib.axes._subplots.AxesSubplot at 0x19aec0c2340>

Discussion of the Different K-Means and Hierarchical Clustering Methods Continued

It is difficult to get a feel by graphically reviewing the different clusters generated K-Means and Hierarchical Clustering techniques. But, in advising AllLife Bank on our two engagements (run personalized campaigns to target new customers and potential upgrades for its service delivery model to ensure timely problem resolution), the Hierarchical Clustering seemed to be better.

While it is clear that the Silhouette scores for the K-Mean Cluster results is higher than for Hierarchical Clustering Results, and it is also clear, when using the Hierarchical Clustering different linkage methodologies (and numbers of clusters), that higher Silhouette and cophenet scores are available, but data visualization forms a major part of efficient data analysis and hierarchical clustering aids in doing so. The value of viewing the dendograms and the box plots for our chosen Hierarchical Clustering ("Ward" linkage using 5 clusters) is evident from our ability to infer more about the customer segments and think about strategy to advise AllLife Bank on its problems.

Let's also recall the three general questions posed by our problem set. First, how many different segments of customers are there? Second, how are these segments different from each other? Third, what are your recommendations to the bank on how to better market to and service these customers?

We have covered the first two questions throughout our observations above. Let's focus on recommendations, however, using our preferred Hierarchical Clustering result. Here are some thoughts on recommendations:

  • Cluster 4 is the smallest group, but what great customers they are in that they conduct their business online and not utilizing real estate or people resources. What is driving this? The need for credit card management because of the high number of credit cards? Is AllLife's website functional enough to provide these customers the best user experience possible? The answer to these questions are very important because this group (i) does not expend bank resources (call centers or bank centers) and (ii) drives revenue. Does the online engagement suggest they have higher needs for personal financial management? If so then perhaps AllLife Bank can partner with financial management websites or the AllLife Bank can deploy some online tools to help its customers and make this customer segment "sticky."
  • In addition, given this group's net revenue proposition, AllLife Bank should offer incentives for the customer segments who are already more online in their interactions, who may qualify for additional credit limits, to take additional credit cards and move into this cluster. Cluster 2 seems like the most likely group to target. In fact, the outliers in Cluster 2 for total visits online suggest this group is primed to move "upwards" to Cluster 4.
  • Cluster 2 behaves more like Cluster 4 with the exception of the high number of calls made to the bank. Why is this? They visit the bank more, and I would like to understand why? As stated above, these customers are more varied, but seem primed for additional banking services.
  • Cluster 0 is interesting and I wish we had customer identification data (age, profession, education, etc.) because I suspect this group to skew younger and/or in lower socio-economic groups. Why? This group does not go to the bank often, but it does seem to try to use online resources, but even moreso many calls the bank. This suggests to me that this group (with the lowest average credit limit) has a lot of questions regarding bank services and/or negotiating conditions and payments related to the credit card. In order to help strategize for increasing this customer segment and - perhaps more importantly - improving the customer support experience in order to keep them engaged with AllLife Bank through their customer lifecycles, I would want to better understand the nature of the calls made to the bank for this group. If the online and telephonic interactions with the bank for this group can be segmented by specific needs, then perhaps AllLife Bank could create easier-to-understand online functionality for their needs to improve the online experience, and then perhaps AllLife Bank can create call specialists to handle specific types of calls for this group. That would improve the user experience, and make the more difficult conversations with this group (payment problems) easier with better trained customer specialists.
  • Clusters 1 and 3, which vary by average credit limit, display the same bank behaviors in they eschew bank calls and minimize their interactions with online banking. However, these customer segments prefer visiting the bank centers. Understanding these segments (propose more data collection from these visits) would be key to determine the "costs" for these customers, as well as the opportunity to create a personalized approach to the banking relationship. Are their opportunities for cross-selling bank products or promotions in person that are more effective than other customer acquisition models? Could extending more credit limits or credit cards to this customer segment help with resource utilization of call support and bank centers.